SEAL: a distributed short read mapping and duplicate removal tool

Authors

  • Luca Pireddu
  • Simone Leo
  • Gianluigi Zanetti
Abstract

SUMMARY: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13 GB per hour in map+rmdup mode, while reaching a throughput of 19 GB per hour in mapping-only mode.

AVAILABILITY: SEAL is available online at http://biodoop-seal.sourceforge.net/.
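
To make the duplicate removal step concrete, here is a minimal single-machine sketch of the general idea behind coordinate-based duplicate removal for read pairs. It is not SEAL's or Picard's code. It assumes that two pairs are duplicates when both ends share the same reference, 5' position and strand, and that the pair with the highest sum of base qualities is kept. The `ReadPair` record, its field names, and the `remove_duplicates` helper are hypothetical and exist only for this illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical record for one mapped read pair (illustration only)."""
    name: str
    chrom1: str
    pos1: int          # 5' position of the first read
    reverse1: bool     # True if the first read maps to the reverse strand
    chrom2: str
    pos2: int          # 5' position of the second read
    reverse2: bool     # True if the second read maps to the reverse strand
    base_quality_sum: int


def pair_key(pair: ReadPair) -> Tuple:
    """Build a key that is the same regardless of which end is listed first."""
    ends = sorted([
        (pair.chrom1, pair.pos1, pair.reverse1),
        (pair.chrom2, pair.pos2, pair.reverse2),
    ])
    return tuple(ends)


def remove_duplicates(pairs: Iterable[ReadPair]) -> List[ReadPair]:
    """Keep one pair per key: the one with the highest base quality sum.

    Assumption: ties are resolved in favour of the pair seen first.
    """
    best = {}
    for pair in pairs:
        key = pair_key(pair)
        if key not in best or pair.base_quality_sum > best[key].base_quality_sum:
            best[key] = pair
    return list(best.values())


pairs = [
    ReadPair("a", "chr1", 100, False, "chr1", 300, True, 50),
    ReadPair("b", "chr1", 100, False, "chr1", 300, True, 70),
    ReadPair("c", "chr1", 101, False, "chr1", 300, True, 40),
    ReadPair("d", "chr1", 300, True, "chr1", 100, False, 60),  # same ends as a and b, swapped
]
kept = remove_duplicates(pairs)
print([p.name for p in kept])
```

In a distributed setting the same key could serve as the grouping key of a shuffle step, so that all candidate duplicates reach the same worker; that mapping onto Hadoop is likewise an assumption of this sketch, not a description of SEAL's internals.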

Similar resources

DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster

With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow t...

Accurate estimation of short read mapping quality for next-generation genome sequencing

MOTIVATION Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities ...

An approach to transcriptome analysis of non-model organisms using short-read sequences.

Transcriptome analysis using high-throughput short-read sequencing technology is straightforward when the sequenced genome is the same species or extremely similar to the reference genome. We present an analysis approach for when the sequenced organism does not have an already sequenced genome that can be used for a reference, as will be the case of many non-model organisms. As proof of concept...

Filtering duplicate reads from 454 pyrosequencing data

MOTIVATION Throughout the recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. Especially the latter application is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the e...

Updates to the RMAP short-read mapping software

SUMMARY We report on a major new version of the RMAP software for mapping reads from short-read sequencing technology. General improvements to accuracy and space requirements are included, along with novel functionality. Included in the RMAP software package are tools for mapping paired-end reads, mapping using more sophisticated use of quality scores, collecting ambiguous mapping locations and...

Journal title:

Volume 27, Issue

Pages -

Publication date: 2011